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use in human-rated spacecraft. 
- Reduces development and upgrade costs. 
- Lowers the need for new design work. 
- Eliminates reliance on individual suppliers. 
- Leverages larger knowledge base. 

- Minimizes schedule risk. 


Problem? Hard to meet the high reliability 
and fault tolerance requirements. 


- E.g. 10⁻⁹ failures/hour in ultra-dependable systems. 
- E.g. Crit-1, “fly-through” fault tolerance. 


- Studies for Orion showed purely COTS designs 
would result in poor reliability and undue expense. 


Often custom proprietary solutions are needed. 
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But the inclusion of COTS technologies 
is becoming more feasible. 
- Greater availability of rad-tolerant components. 
- TMR (Maxwell SCS750), lock-step (ARM R5). 
- Ability to realize fault-containment regions. 
- Growing number of suppliers. 


NASA’s strategy for future spacecraft has 
heavily prioritized using COTS parts. 


- Includes launchers, landers, etc. 
Multiple projects have explored realizing 
safety-critical systems using COTS. 


- Scalable Processor-Independent Design for 
Extended Reliability (SPIDER). 


- Heavy Lift Vehicle (HLV) Architecture Study. 
- Evolvable Mars Campaign (lander). 
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[Figure: fault classification tree. All Faults split into Byzantine (asymmetric) and symmetric branches, with leaf classes Transmissive Asymmetric, Strictly Omissive Asymmetric, Omissive Symmetric, and Transmissive Symmetric.] 
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Fault Classifications (cont.) 


Symmetric faults: different observers see a fault manifest in the same way. 
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Fault Classifications (cont.) 


Asymmetric (Byzantine) faults: different observers see a fault manifest in different ways. 


SAE INTERNATIONAL Paper # 2017-01-2111 6/23 


- Possibility is considered low enough 
to not warrant additional complexity. 


- Impacts of faults are less severe 
(e.g. not taking a picture). 


- Especially for dynamic mission 
phases with short time to effect. 


- Higher number of “all-or-none” 
events (e.g. deploy parachutes). 


- Failure could result in loss of life. 
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Byzantine Faults 


Byzantine faults can disrupt consensus 
among redundant processors. 
- E.g. on internal state information. 
- E.g. on sensor data. 
- E.g. on diagnosis of system faults. 


Occur at rates much greater than 10⁻⁹ failures/hour. 
- Slightly-off-specification (SOS) hardware. 


- Stuck transmitter — different receivers can 
interpret a marginal signal differently. 


- Time base corruption — messages received 
slightly too early or too late. 
Several architectural approaches for 
Byzantine-resilient systems. 
- Hierarchical — e.g. SAFEbus, Orion VMCs. 
- Full exchange — e.g. Draper FTMP, SPIDER. 
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“Channelized bus” approach 
is common in launchers. 


- Each OBC can only access 
devices on its local bus. 


- Uses full exchanges. 

- Usually designed to be 1FT. 
Examples: 

- X-38 CRV, Ares I, Delta IV. 


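[Figure: channelized bus architecture. Each OBC has a bus interface to its own local bus and a CCDL interface to the shared Cross-Channel Data Link (CCDL). An interstage with its own CCDL interface and an external time reference are also shown.] 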
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Shortcomings? 

1. Requires separate CCDL for data exchange between OBCs. 
2. Often requires external timing hardware for synchronization. 
3. Requires separate interstage to meet minimum number of FCRs. 
4. Requires two rounds of data exchange between OBCs. 
5. Bandwidth limited. 
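The third and fourth shortcomings line up with the classical bounds for tolerating f arbitrary (Byzantine) faults without signed messages: at least 3f + 1 fault-containment regions and f + 1 rounds of exchange. The minimal sketch below evaluates both for the 1FT case; the function names are hypothetical and do not come from the slides.

```python
def min_fault_containment_regions(f: int) -> int:
    """Classical lower bound on FCRs needed to tolerate f Byzantine faults
    when messages are not signed."""
    return 3 * f + 1


def min_exchange_rounds(f: int) -> int:
    """Classical lower bound on rounds of data exchange for f Byzantine faults."""
    return f + 1


if __name__ == "__main__":
    f = 1  # one-fault tolerant (1FT), as in the channelized bus designs
    print(min_fault_containment_regions(f))  # 4
    print(min_exchange_rounds(f))            # 2
```

With only three OBCs, a 1FT design is one FCR short of the bound, which is consistent with the interstage being added as a fourth region.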
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1FT “switched voter” using TTE. 
- Requires only 3 full processors. 
- Requires 2-3 redundant switches. 
- Devices can connect to OBCs directly or via TTE network. 
- Assumes minimum number of SMs and CMs are present for sync. 


TTE network used for data distribution and sync. 
- Eliminates need for separate CCDL. 
- Eliminates need for timing hardware. 
- Bandwidth up to 1 Gbit/s. 


Switches act as interstages. 
- Messages reflected to/from the switches. 
- Eliminates need for fourth processor. 
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Failure Assumptions 


OBCs can send arbitrary messages: 
- May transmit at any point in time. 
- May send different messages to different switches. 


Switches fail only by inconsistent omission: 
- May not create a new “valid” message. 
- May drop or fail to receive an arbitrary number of messages. 
- May relay messages asymmetrically; some receivers may not get data. 
- Acts as a ‘trusted sender’. 


[Figure: TTE switch block diagram with COM and MON TTE-Switch cores, a controller, an AHB/APB bus, Ethernet switch ports (6x RGMII/RMII + 12x RMII), and 3x RS232/422/485, SPI, MDIO, and JTAG interfaces.] 


Fault propagation from switches theoretically 
requires dual-correlated simultaneous faults. 


> 10° x10° = ~10°' failures/hour 
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(1) OBC1 reads a value of 5 on its local bus and wants to share it. 
(2) A fault causes OBC1 to send a bad value K to SW1. 
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(3) Each switch (SW1, SW2, SW3) relays the data to all OBCs. 
(4) Each OBC votes the values sent from the switches. 
Absent data is not included in the vote. 
- Vote could be implemented in TTE NIC or in software on the OBCs. 


Each OBC receives K, 5, 5 and reaches Final: 5. 
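The vote in step 4 can be sketched as a strict-majority vote over the copies relayed by the switches, with absent copies dropped first. This is a minimal illustration, not the implementation used in the TTE NIC or flight software; `ABSENT` and `vote_excluding_absent` are hypothetical names.

```python
from collections import Counter

ABSENT = None  # hypothetical marker for a copy that never arrived


def vote_excluding_absent(copies):
    """Strict-majority vote over the copies relayed by the switches.

    Absent copies are removed before voting. Returns the majority value,
    or None when no value holds a strict majority of the copies present.
    """
    present = [c for c in copies if c is not ABSENT]
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(present) / 2 else None


if __name__ == "__main__":
    # OBC1 sent the bad value K to SW1, so every OBC sees K, 5, 5.
    print(vote_excluding_absent(["K", 5, 5]))  # 5
```

Because every OBC receives the same three copies from the switches, every OBC computes the same result.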
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Agreement on External Data 


(1) A fault causes different values to be sent to each switch. 
(2) Each switch relays the data to all OBCs. 


All OBCs agree that no majority is found. 
Each OBC (OBC1, OBC2, ..., OBCN) receives 5, K, _ and reaches Final: ∅ (no majority). 
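The agreement property above can be illustrated with a small simulation: a faulty sender gives each switch a different value, fault-free switches forward what they received to every OBC, and each OBC votes. The sketch assumes fault-free switches and a third copy that is absent; the function names are hypothetical.

```python
from collections import Counter


def vote(copies):
    """Strict-majority vote; absent copies (None) are left out."""
    present = [c for c in copies if c is not None]
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(present) / 2 else None


def relay_and_vote(sent_to_switches, n_obcs):
    """Each fault-free switch forwards its received value to every OBC,
    so all OBCs vote over the same list of copies."""
    views = [list(sent_to_switches) for _ in range(n_obcs)]
    return [vote(view) for view in views]


if __name__ == "__main__":
    results = relay_and_vote([5, "K", None], n_obcs=3)
    print(results)                 # [None, None, None]
    print(len(set(results)) == 1)  # True: all OBCs agree
```

The OBCs do not recover a value here, but they all reach the same "no majority" outcome, which is what consensus requires.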
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(1) A fault causes OBC2 to send a bad value to SW3. 
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(2) Each switch relays the data from each OBC to all RIUs. 


(3) Each RIU either: 
1. Accepts the first valid value from each OBC. 
or 
2. Votes the redundant values from each OBC. 


(4) Each RIU votes the values accepted in Step 3. 
Absent data is included in the vote. 


[Figure: RIU1 and RIU2, each with a TTE NIC. The three switches hold 5, 5, 5 / 5, 5, 5 / 5, K, 5.] 
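Steps 3 and 4 can be sketched as a two-stage reduction at each RIU: first reduce the redundant copies from each OBC to one accepted value, then vote across OBCs with absent entries counted. The sketch below is a minimal illustration; all names are hypothetical, and "valid" is simplified to "present", whereas a real RIU would also check message integrity.

```python
from collections import Counter

ABSENT = None  # hypothetical marker for missing data


def majority(values):
    """Strict majority over all entries; absent entries count as entries."""
    if not values:
        return ABSENT
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else ABSENT


def first_valid(copies):
    """Option 1: accept the first copy that arrived from an OBC."""
    return next((c for c in copies if c is not ABSENT), ABSENT)


def vote_copies(copies):
    """Option 2: vote the redundant copies from one OBC, ignoring absent ones."""
    return majority([c for c in copies if c is not ABSENT])


def riu_command(copies_by_obc, accept=first_valid):
    """Step 3 (per-OBC acceptance) followed by Step 4 (vote across OBCs)."""
    accepted = [accept(copies) for copies in copies_by_obc.values()]
    return majority(accepted)


if __name__ == "__main__":
    # Copies are ordered SW1, SW2, SW3; OBC2 sent a bad value K to SW3.
    received = {"OBC1": [5, 5, 5], "OBC2": [5, 5, "K"], "OBC3": [5, 5, 5]}
    print(riu_command(received))                      # 5
    print(riu_command(received, accept=vote_copies))  # 5
```

Counting absent data in Step 4 means a value is accepted only when a majority of all OBCs, not just the responding ones, support it.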
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Happening Simultaneously ... 
(5) Each switch reflects the original data back to all OBCs. 


(6) Each OBC votes the redundant values from each OBC. 
Absent data is not included in the vote. 


(7) Each OBC votes the results from Step 6 to diagnose faulty OBCs. 
Absent data is included in the vote. 


[Figure: RIU1 and RIU2 (each with a TTE NIC), switches SW1, SW2, and SW3 holding 5, 5, 5 / 5, 5, 5 / 5, K, 5, and OBC2.] 
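One way to read steps 6 and 7 is sketched below: each OBC first votes the reflected copies per source OBC (absent copies dropped), then votes across those per-OBC results (absent results counted) and flags any OBC whose result differs from the majority. The slides do not spell out the diagnosis rule, so the flagging logic and all names here are assumptions.

```python
from collections import Counter

ABSENT = None  # hypothetical marker for missing data


def vote_present(copies):
    """Step 6: vote one OBC's redundant copies, ignoring absent ones."""
    present = [c for c in copies if c is not ABSENT]
    if not present:
        return ABSENT
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(present) / 2 else ABSENT


def diagnose(reflected):
    """Step 7: vote the per-OBC results (absent included) and flag outliers.

    Returns (agreed_value, suspects). If no present value holds a strict
    majority, no diagnosis is attempted and suspects is empty.
    """
    results = {obc: vote_present(copies) for obc, copies in reflected.items()}
    if not results:
        return None, set()
    value, count = Counter(results.values()).most_common(1)[0]
    if value is ABSENT or count <= len(results) / 2:
        return None, set()
    suspects = {obc for obc, result in results.items() if result != value}
    return value, suspects


if __name__ == "__main__":
    # Slide scenario: OBC2's bad value reached only SW3, so it is outvoted in Step 6.
    print(diagnose({"OBC1": [5, 5, 5], "OBC2": [5, 5, "K"], "OBC3": [5, 5, 5]}))
    # An OBC that sends K to every switch is flagged in Step 7.
    print(diagnose({"OBC1": [5, 5, 5], "OBC2": ["K", "K", "K"], "OBC3": [5, 5, 5]}))
```

Under these assumptions a silent OBC is also flagged, because its absent result is counted in Step 7 and differs from the agreed value.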
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Questions? 
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